Abstract

Differential privacy is a rigorous privacy condition achieved by randomizing query answers. This paper 
develops efficient algorithms for answering multiple queries under differential privacy with low error. We 
pursue this goal by advancing a recent approach called the matrix mechanism, which generalizes standard 
differentially private mechanisms. This new mechanism works by first answering a different set of queries 
(a strategy) and then inferring the answers to the desired workload of queries. Although a few strategies 
are known to work well on specific workloads, finding the strategy which minimizes error on an arbitrary 
workload is intractable. We prove a new lower bound on the optimal error of this mechanism, and we 
propose an efficient algorithm that approaches this bound for a wide range of workloads. 

1 Introduction 

Differential privacy [4] is a rigorous privacy condition, guaranteeing participants that the information released 
about the data will be virtually indistinguishable whether or not their personal data is included. Differential 
privacy is achieved by randomizing query answers. While there are a number of general-purpose mechanisms 
for satisfying differential privacy [2], there are comparatively few results showing that these mechanisms are
optimally accurate, that is, that the least possible distortion has been introduced to satisfy the privacy criterion.
For a single numerical query, the addition of appropriately scaled Laplace noise satisfies $\varepsilon$-differential
privacy and has been proven optimally accurate [5]. For workloads of multiple queries, optimally accurate
mechanisms are not known.

Our focus is on batch query answering, in which multiple queries are answered at one time, in a single 
interaction with the private server. A batch of queries, or a workload, here consists of a set of linear 
counting queries. These include predicate counting queries, histograms, sets of marginals, data cubes, or any 
combination of these. 

The goal of research in this area is to devise an efficient algorithm that can achieve the least possible 
error under differential privacy. In this work we pursue this goal by advancing a recently-proposed technique 
called the matrix mechanism [10]. The matrix mechanism generalizes standard differentially private output 
perturbation techniques, and we explain it by comparing it with the standard Laplace mechanism. 

The Laplace mechanism answers a workload of queries by adding to each query a sample chosen indepen- 
dently at random from a Laplace distribution. The distribution is scaled to the sensitivity of the workload 
(the maximum possible change to the query answers induced by the addition or removal of one tuple). Con- 
sider using the Laplace mechanism to simultaneously release answers to the set of all range-count queries 
over a database containing ages for a community. This workload consists of all queries AgeCount(a, b) which 
return the number of individuals whose age is between a and b, for any constants $a, b \in \{1, \ldots, 120\}$.

Using the Laplace mechanism directly results in extremely noisy query answers for this workload because 
the noise added to each query in the workload is proportional to the sensitivity of the workload. The sensitivity
of this workload is $O(n^2)$, where n is the size of the domain (120 in this example). In addition, because
independent noise is added to each query, the answers are inconsistent: e.g., the sum of AgeCount(20, 40)
and AgeCount(41, 60) will not, in general, equal AgeCount(20, 60) as one might hope.
An alternative to the direct application of the Laplace mechanism arises from recognizing that the
workload of all range-count queries can be computed from a smaller set of query answers, namely counts for
each individual age 1, 2, ..., 120. Because each count is independent, this approach has very low sensitivity and
does not introduce inconsistency. However, the released query results must be summed to get the answers
to the desired range queries: e.g., AgeCount(1, 10) is the sum AgeCount(1, 1) + ... + AgeCount(10, 10).
Although each individual count has low error, the expected error accumulates when computing the sum, 
leading to significant error for large ranges. 

The matrix mechanism (so named because workloads are represented as matrices and analyzed alge- 
braically) can be seen as a generalization of the multi-query Laplace mechanism. The approaches above can 
be seen as two extremes: one in which the workload is submitted to the Laplace mechanism, and one in 
which the workload is divided into independent queries and those are submitted. The matrix mechanism 
encompasses both of these extremes, along with many other approaches, some offering significantly lower 
error. 

Given a workload of queries, the matrix mechanism uses the Laplace mechanism to answer a different set 
of queries, called a strategy. The answers to the strategy queries are then used to derive the query answers 
ultimately desired — the workload queries. If there are related queries in the strategy, linear regression is used 
to combine the evidence from all available query answers (in an optimal way) to produce the final answers. 
The result is a consistent final answer to the workload queries, often with improved error. The matrix 
mechanism can improve error because the strategy queries can be used to avoid or reduce redundancy that 
may be present in the workload, thus lowering sensitivity. Also, redundancy that does exist in the strategy 
queries is exploited by the linear regression process to improve accuracy. 

Using the matrix mechanism requires instantiating it with a set of strategy queries which is a good 
match for the workload. For the workload of all range queries, two approaches were independently proposed 
recently, one based on a wavelet transform [13], and one based on a hierarchical query set [9]. While these 
approaches look quite different at first glance, they are in fact different sets of strategy queries which can be 
analyzed as instances of the matrix mechanism [10]. Both strategies result in much lower error: $O(\log^3 n)$
instead of $O(n^4)$ for the workload itself or $O(n)$ for the approach that asks for individual counts.

Thus research to date has shown that for a particular workload (the set of all range-count queries) there 
are strategies that offer significantly lower error. But these strategies do not work well for all workloads. To 
exploit the full power of the matrix mechanism, we must customize strategies to the given workload. 

Unfortunately, we cannot hope for exact solutions to this problem. The strategy design problem, i.e., cal- 
culating the strategy that results in the minimal error for a given workload, is intractable [10]. Nevertheless, 
in this paper we provide efficient algorithms for computing a set of strategy queries for a workload and show 
that they approach optimally accurate strategies. 

A key aspect of our approach is the move from standard $\varepsilon$-differential privacy to $(\varepsilon, \delta)$-differential privacy,
a modest relaxation of the privacy condition often called approximate differential privacy. The matrix 
mechanism is easily adapted to approximate differential privacy. Computing the strategy with minimal error 
remains computationally infeasible. But we show that under this definition the matrix mechanism has a 
number of nice features which make it amenable to analysis and which lead to better approximate solutions. 

Our contributions are organized as follows. The central theoretical result, shown in Sec. 3, is a tight 
lower bound on the minimum total error achievable for a workload. It can be efficiently computed from the 
spectral properties of a workload when it is represented in matrix form. In Sec. 4, we propose an efficient 
localized search algorithm for solving the strategy design problem. In Sec. 5, we show experimentally that,
for a variety of workloads, our algorithm produces strategies that approach the optimal. 

We use our bound on error to understand workload complexity — that is, the relative difficulty of simul- 
taneously answering a workload of queries accurately. For instance, sets of multi-dimensional range queries 
are "easier" to answer than one dimensional range queries. We also use the error bound to evaluate the 
quality of existing approaches. The hierarchical and wavelet strategies mentioned above are fairly close to 
optimal in some cases, but can be significantly improved in others. For example, for two-dimensional range 
queries, the wavelet strategy results in error 2.2 times the optimal while our algorithm finds a strategy that 
is just 10% greater than the optimal. Ultimately, we conclude that adapting the matrix mechanism to the 
specific properties of a workload is crucial: there are workloads for which the total error achieved using our
algorithm is an order of magnitude less than that of existing techniques. 

2 Background 

In this section we describe our data model and privacy conditions. We also review the fundamentals of the 
matrix mechanism, including error measurement and our main problem of strategy design. 

2.1 Data model and linear queries 

The database $I$ is an instance of a single-relation schema $R(A)$, with attributes $A = \{A_1, A_2, \ldots, A_m\}$. The
crossproduct of the attribute domains, written dom(A), is the set of all possible tuples. In order to express
our queries, we first construct from $I$ a vector x of cell counts. Each element $x_i$ of x is associated with
$cell(x_i)$, a disjoint subset of dom(A). Then $x_i$ is the count of the tuples from $cell(x_i)$ that are present in $I$:
$x_i = |\{t \in I \mid t \in cell(x_i)\}|$. All queries are expressed using the cell counts in x. We always use n for the size
of x, which we sometimes refer to simply as the domain size. 

One way to define the vector x is to choose the smallest possible cells: one cell for each tuple in dom(A). 
This is often inefficient (the size of the x vector is the product of the attribute domain sizes) and ineffective 
(the base counts are typically too small to be estimated accurately under the privacy condition). Instead, we 
often consider queries over larger cells. A common way to form a vector of base counts is to partition each 
$dom(A_i)$ into $d_i$ regions, which could correspond to ranges over an ordered domain, or individual elements
(or sets of elements) in a categorical domain. Then the individual cells are defined by taking the cross-product
of the regions in each domain. Other alternatives are possible as the cells need not be formed from
contiguous subsets of the attribute domains as long as all cells are disjoint. Constructing a vector of base 
counts appropriate to a workload is usually straightforward but we provide a practical example in App. A 
to aid the reader. 
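
As a minimal sketch of this construction (assuming Python with NumPy; the helper name `cell_counts`, the ages, and the region boundaries are hypothetical and not taken from the paper), the vector x for a single ordered attribute can be formed by counting the tuples that fall in each region:

```python
import numpy as np

def cell_counts(values, edges):
    """Count the values in each region [edges[i], edges[i+1]).

    np.histogram also includes the right edge in the last region.
    """
    counts, _ = np.histogram(values, bins=edges)
    return counts

# Hypothetical data: one Age attribute partitioned into five disjoint regions.
ages = np.array([23, 35, 35, 41, 67, 18, 90])
edges = np.array([0, 20, 40, 60, 80, 121])
x = cell_counts(ages, edges)  # one count per cell
```

Because the regions are disjoint and cover the domain, the entries of x sum to the number of tuples in the database.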

A linear query computes a specified linear combination of the cell counts in x. 

Definition 2.1 (Linear query). A linear query is a length-n row vector $q = [q_1 \ldots q_n]$ with each $q_i \in \mathbb{R}$. The
answer to a linear query q on x is the vector product $qx = q_1 x_1 + \cdots + q_n x_n$.

A set of queries is represented as a matrix, each row of which is a single linear query. 

Definition 2.2 (Query matrix). A query matrix is a collection of m linear queries, arranged by rows to 
form an m x n matrix. 

If W is an m x n query matrix, the query answer for W is a length m column vector of query results, 
which can be computed as the matrix product Wx. 

A workload is a query matrix representing a batch of linear queries of interest to a user. We introduce 
notation for two common workloads which contain all queries of a certain type. AllRange($d_1, \ldots, d_k$) refers
to the set of all multi-dimensional range-count queries over k ordered domains, where the i-th domain is divided into
$d_i$ regions. Here the size of the cell count vector, n, is the product $\prod_{i=1}^{k} d_i$. AllPredicate(n) is the set of
all predicate counting queries over n cells. AllPredicate(n) contains all $2^n$ linear queries of size n with
coefficients of 0 or 1. We also consider workloads consisting of arbitrary subsets of each of these types of
queries, low-order marginals, and their combinations. 
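
To make the matrix representation concrete, the following sketch (assuming Python with NumPy; `all_range_workload` is a hypothetical helper name and the cell counts are made up) builds the one-dimensional workload AllRange(n) as a query matrix and evaluates it on a vector of cell counts:

```python
import numpy as np

def all_range_workload(n):
    """Query matrix whose rows are the indicator vectors of every range
    [a, b] with 0 <= a <= b < n (cells are indexed from 0 here)."""
    rows = []
    for a in range(n):
        for b in range(a, n):
            q = np.zeros(n)
            q[a:b + 1] = 1.0
            rows.append(q)
    return np.array(rows)

W = all_range_workload(4)            # n(n+1)/2 = 10 queries over 4 cells
x = np.array([3.0, 0.0, 2.0, 5.0])   # hypothetical cell counts
answers = W @ x                      # true answers, one per range query
```

Each row of W is a linear query in the sense of Definition 2.1, and `W @ x` is the query answer Wx.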

2.2 Privacy definitions and mechanisms 

Standard $\varepsilon$-differential privacy [4] places a bound (controlled by $\varepsilon$) on the difference in the probability of query
answers for any two neighboring databases. For database instance $I$, we denote by nbrs(I) the set of databases
differing from $I$ in at most one record. Approximate differential privacy [3, 11] is a modest relaxation in
which the $\varepsilon$ bound on query answer probabilities may be violated with small probability (controlled by $\delta$).
Definition 2.3 (Approximate Differential Privacy). A randomized algorithm $\mathcal{K}$ is $(\varepsilon, \delta)$-differentially private
if for any instance $I$, any $I' \in nbrs(I)$, and any subset of outputs $S \subseteq Range(\mathcal{K})$, the following holds:

$$\Pr[\mathcal{K}(I) \in S] \le \exp(\varepsilon) \times \Pr[\mathcal{K}(I') \in S] + \delta$$

Both definitions can be satisfied by adding random noise to query answers. The magnitude of the required 
noise is determined by the sensitivity of a set of queries: the maximum change in a vector of query answers 
over any two neighboring databases. However, the two privacy definitions differ in the measurement of 
sensitivity and in their noise distributions. Standard differential privacy can be achieved by adding Laplace 
noise calibrated to the $L_1$ sensitivity of the queries [4]. Approximate differential privacy can be achieved
by adding Gaussian noise calibrated to the $L_2$ sensitivity of the queries [3, 11]. This small difference in
the sensitivity metric, from $L_1$ to $L_2$, has important consequences for our algorithms and our theoretical
results. Our results focus on approximate differential privacy, but App. B contains a thorough comparison 
of these two definitions as they pertain to the matrix mechanism and the results of this paper. 

Since our query workloads are represented as matrices, we express the sensitivity of a workload as a 
matrix norm. Because neighboring databases $I$ and $I'$ differ in exactly one tuple, and because cells are
disjoint, it follows that the corresponding vectors x and x' differ in exactly one component, by exactly one,
in which case we write $x' \in nbrs(x)$. The $L_2$ sensitivity of W is equal to the maximum $L_2$ norm of the
columns of W. Below, cols(W) is the set of column vectors $W_i$ of W.

Proposition 2.1 ($L_2$ Query matrix sensitivity). The $L_2$ sensitivity of a query matrix W is denoted $\|W\|_2$,
defined as follows:

$$\|W\|_2 \overset{\text{def}}{=} \max_{x' \in nbrs(x)} \|Wx - Wx'\|_2 = \max_{W_i \in cols(W)} \|W_i\|_2$$
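
The equality in Prop. 2.1 can be checked numerically. The sketch below (assuming Python with NumPy; the function names and the small example matrix are illustrative only) computes the sensitivity as the largest column norm and compares it with a direct evaluation over neighbors x' = x + e_i:

```python
import numpy as np

def l2_sensitivity(W):
    """Maximum L2 norm over the columns of W."""
    return np.linalg.norm(W, axis=0).max()

def l2_sensitivity_by_neighbors(W):
    """Largest ||Wx - Wx'||_2 when x' differs from x by one in a single cell.
    Since Wx' - Wx = W e_i, this does not depend on x."""
    n = W.shape[1]
    return max(np.linalg.norm(W @ np.eye(n)[i]) for i in range(n))

# Hypothetical 3 x 4 query matrix.
W = np.array([[1.0, 1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0, 1.0]])
sens = l2_sensitivity(W)
```

Here the second cell appears in all three queries, so its column has the largest norm and determines the sensitivity.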

The classic differentially private mechanism adds independent noise calibrated to the sensitivity of a 
query workload. We use $Normal(\sigma)^m$ to denote a column vector consisting of m independent samples drawn
from a Gaussian distribution with mean 0 and scale $\sigma$.

Proposition 2.2. (Gaussian mechanism [3, 11]) Given an m x n query matrix W, the randomized
algorithm $\mathcal{G}$ that outputs the following vector is $(\varepsilon, \delta)$-differentially private:

$$\mathcal{G}(W, x) = Wx + Normal(\sigma)^m$$

where $\sigma = \|W\|_2 \sqrt{2 \ln(2/\delta)} / \varepsilon$.

Recall that Wx is a vector of the true answers to each query in W. The algorithm above adds independent 
Gaussian noise (scaled by the sensitivity of W, $\varepsilon$, and $\delta$) to each query answer. Thus $\mathcal{G}(W, x)$ is a length-m
column vector containing a noisy answer for each linear query in W. 
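
A minimal sketch of Prop. 2.2 follows, assuming Python with NumPy. The function names, the example data, and the parameter values are illustrative; a production implementation would also need care with floating-point noise generation, which is outside the scope of this sketch.

```python
import numpy as np

def noise_scale(W, eps, delta):
    """sigma = ||W||_2 * sqrt(2 ln(2/delta)) / eps, with ||W||_2 the max column norm."""
    sens = np.linalg.norm(W, axis=0).max()
    return sens * np.sqrt(2.0 * np.log(2.0 / delta)) / eps

def gaussian_mechanism(W, x, eps, delta, rng):
    """Return Wx plus m independent Gaussian samples of scale sigma."""
    sigma = noise_scale(W, eps, delta)
    return W @ x + rng.normal(0.0, sigma, size=W.shape[0])

rng = np.random.default_rng(0)         # seeded only to make the sketch repeatable
W = np.eye(3)                          # three individual counts
x = np.array([10.0, 20.0, 30.0])       # hypothetical cell counts
noisy = gaussian_mechanism(W, x, eps=1.0, delta=0.01, rng=rng)
```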

The matrix mechanism has a form similar to the algorithm above, but adds a more complex noise vector.
It uses a different set of queries (the strategy matrix, A) to construct this vector. 

Proposition 2.3. ($(\varepsilon, \delta)$-Matrix Mechanism [10]) Given an m x n query matrix W, and assuming A
is a full rank p x n strategy matrix, the randomized algorithm $\mathcal{M}_A$ that outputs the following vector is
$(\varepsilon, \delta)$-differentially private:

$$\mathcal{M}_A(W, x) = Wx + WA^+ Normal(\sigma)^p$$

where $\sigma = \|A\|_2 \sqrt{2 \ln(2/\delta)} / \varepsilon$.

Here $A^+$ is the pseudo-inverse of A: $A^+ = (A^T A)^{-1} A^T$; if A is square, then $A^+$ is just the inverse of
A. The intuitive justification for this mechanism is that it is equivalent to the following three-step process:
(1) the queries in the strategy are submitted to the Gaussian mechanism; (2) an estimate $\hat{x}$ for x is derived
by computing the $\hat{x}$ that minimizes the squared sum of errors (this step consists of standard linear regression
and requires that A be full rank to ensure a unique solution); (3) noisy answers to the workload queries
are then computed as $W\hat{x}$. The answers to W derived in step (3) are always consistent because they are
computed from a single noisy version of the cell counts, $\hat{x}$.
Like the Gaussian mechanism, the matrix mechanism computes the true answer vector Wx and adds
noise to each component. But a key difference is that the scale of the Gaussian noise is calibrated to the
sensitivity of the strategy matrix A, not that of the workload. In addition, the noise added to query answers
is no longer independent, because the vector of independent Gaussian samples is transformed by the matrix
$WA^+$.
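
The three-step process can be sketched as follows, assuming Python with NumPy. The function name and the example workload are illustrative; the noise vector has one entry per strategy query, and the least-squares step uses `np.linalg.lstsq`, which is one standard way to apply the pseudo-inverse when A has full column rank.

```python
import numpy as np

def matrix_mechanism(W, A, x, eps, delta, rng):
    sens = np.linalg.norm(A, axis=0).max()                    # ||A||_2
    sigma = sens * np.sqrt(2.0 * np.log(2.0 / delta)) / eps
    # (1) answer the strategy queries with the Gaussian mechanism
    y = A @ x + rng.normal(0.0, sigma, size=A.shape[0])
    # (2) least-squares estimate of the cell counts
    x_hat, *_ = np.linalg.lstsq(A, y, rcond=None)
    # (3) derive the workload answers from the single estimate x_hat
    return W @ x_hat

rng = np.random.default_rng(1)
x = np.array([3.0, 0.0, 2.0, 5.0])       # hypothetical cell counts
W = np.array([[1.0, 1.0, 0.0, 0.0],      # range over cells 0..1
              [0.0, 0.0, 1.0, 1.0],      # range over cells 2..3
              [1.0, 1.0, 1.0, 1.0]])     # range over cells 0..3
out = matrix_mechanism(W, np.eye(4), x, eps=1.0, delta=0.01, rng=rng)
```

Because all three answers are computed from the same estimate, the first two always sum to the third, which is the consistency property described above.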

Example 2.1. The full rank strategy matrix with least sensitivity is the identity matrix, I, which has sensitivity
1. With A = I, the matrix mechanism formalizes the approach mentioned in Sec. 1 which computes
individual counts and sums them to answer range queries. At the other extreme, the workload itself can be
used as the strategy: A = W. In this case, there is no benefit in sensitivity over the Gaussian mechanism.
For many workloads, neither of these naive strategies offers optimal error. For W = AllRange(n), two
strategies were recently proposed. The hierarchical strategy includes the total sum over the whole domain, the
count of each half of the domain, and so on, terminating with counts of individual elements of the domain.
The wavelet strategy consists of the matrix describing the Haar wavelet transformation. Informally, these
achieve low error because they each have low sensitivity, $O(\log n)$, and every range query can be expressed as
a linear combination of few strategy queries.
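
As an illustration of the hierarchical strategy described in Example 2.1, the sketch below (assuming Python with NumPy, a binary hierarchy, and a domain size n that is a power of two; `hierarchical_strategy` is a hypothetical helper, not code from [9]) builds the strategy matrix level by level:

```python
import numpy as np

def hierarchical_strategy(n):
    """Rows: the total over all n cells, then each half, each quarter, ...,
    down to individual cells. Assumes n is a power of two."""
    rows = []
    width = n
    while width >= 1:
        for start in range(0, n, width):
            q = np.zeros(n)
            q[start:start + width] = 1.0
            rows.append(q)
        width //= 2
    return np.array(rows)

A = hierarchical_strategy(8)                 # 1 + 2 + 4 + 8 = 15 queries
sens = np.linalg.norm(A, axis=0).max()       # each cell appears once per level
```

Each cell is covered by exactly one query per level, so with log2(n) + 1 levels every column has the same number of ones and the sensitivity grows only with the number of levels.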

2.3 Error of the Matrix Mechanism 

We measure the accuracy of differentially private query answers using mean squared error. For a workload 
of queries, the error is defined as the total of individual query errors. 

Definition 2.4 (Query and Workload Error). Let $\hat{w}$ be the estimate for query w under the matrix mechanism
using query strategy A. That is, $\hat{w} = \mathcal{M}_A(w, x)$. The mean squared error of the estimate for w using strategy
A is:

$$\mathrm{Error}_A(w) = E[(wx - \hat{w})^2].$$

Given a workload W, the total mean squared error of answering W using strategy A is:
$\mathrm{Error}_A(W) = \sum_{w_i \in W} \mathrm{Error}_A(w_i)$.

The query answers returned by the matrix mechanism are linear combinations of noisy strategy query 
answers to which independent Gaussian noise has been added. Thus, as the following proposition shows, we 
can directly compute the error for any linear query w or workload W: 

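Proposition 2.4. (Total Error) Given a workload W, the total error of answering W using the $(\varepsilon, \delta)$
matrix mechanism with query strategy A is:

$$\mathrm{Error}_A(W) = P(\varepsilon, \delta)\, \|A\|_2^2\, \mathrm{trace}(W^T W (A^T A)^{-1}) \qquad (1)$$

where $P(\varepsilon, \delta) = \frac{2 \log(2/\delta)}{\varepsilon^2}$.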

The trace is the sum of the elements on the main diagonal of a square matrix. Our main objective is 
to minimize this formula, which is determined by the relationship between A and W. Note that x (the 
vector of cell counts corresponding to the database) does not appear in this expression. The minimum error 
strategy depends on the workload alone, not on the database instance. 
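
Equation (1) translates directly into code. The sketch below (assuming Python with NumPy; the helper names are illustrative, and the logarithm is taken to be the natural logarithm, matching the noise scale in Prop. 2.3) evaluates the total error of the two naive strategies from Example 2.1 on a small range workload:

```python
import numpy as np

def total_error(W, A, eps, delta):
    """P(eps, delta) * ||A||_2^2 * trace(W^T W (A^T A)^{-1}); A must have full column rank."""
    P = 2.0 * np.log(2.0 / delta) / eps ** 2
    sens_sq = (A ** 2).sum(axis=0).max()
    return P * sens_sq * np.trace(W.T @ W @ np.linalg.inv(A.T @ A))

def all_range_workload(n):
    """All ranges [a, b] over n cells, one indicator row per range."""
    return np.array([[1.0 if a <= i <= b else 0.0 for i in range(n)]
                     for a in range(n) for b in range(a, n)])

W = all_range_workload(4)
eps, delta = 1.0, 0.01
err_identity = total_error(W, np.eye(4), eps, delta)   # A = I
err_workload = total_error(W, W, eps, delta)           # A = W
```

For this tiny domain the identity strategy gives trace(W^T W) = 20 with sensitivity 1, while using the workload itself gives trace equal to n = 4 but squared sensitivity 6, so the two errors are 20 P and 24 P respectively.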

2.4 The strategy design problem 

The optimal strategy for a workload W is defined to be one that minimizes the total error under the 
$(\varepsilon, \delta)$-matrix mechanism.

Problem 2.1. (Minimizing Total Error) Given a workload W, find a query strategy A such that:

$$\mathrm{Error}_A(W) = \min_{A'} \mathrm{Error}_{A'}(W). \qquad (2)$$

The exact solution to Problem 2.1 can be computed using a semi-definite program (SDP) [10]. However, 
finding the solutions of the program with standard SDP solvers takes $O(n^8)$ time, making it infeasible for
common applications. Therefore, our goal in this paper is to efficiently find approximations to the optimal 
strategy, for any provided workload. 
3 Theoretical Analysis



In this section we present novel theoretical results which provide a foundation for the algorithms that come 
later, and which aid in evaluating both the quality of strategy matrices and the complexity of workloads. 

3.1 Analysis of Strategies 

An analysis of strategies allows us to characterize those which are possible solutions to the strategy design 
problem. 

Definition 3.1 (Partial Order on Strategies). The following relation defines a partial order on strategies.
Strategy Equivalence: Two strategy matrices $A_1$ and $A_2$ are equivalent ($A_1 \equiv A_2$) if for all linear queries q,

$$\mathrm{Error}_{A_1}(q) = \mathrm{Error}_{A_2}(q).$$

Strategy Efficiency: Strategy $A_1$ is more efficient than $A_2$, written $A_1 \le A_2$, if for all linear queries q,

$$\mathrm{Error}_{A_1}(q) \le \mathrm{Error}_{A_2}(q).$$

$A_1$ is strictly more efficient than $A_2$ if for all queries q, $\mathrm{Error}_{A_1}(q) < \mathrm{Error}_{A_2}(q)$.

A strategy can never be the solution to the strategy design problem (Problem 2.1) if there is another 
strategy that is more efficient than it. We call a strategy minimal if it is a minimal element in the partial 
order above. 

We show next that the set of minimal strategies has a straightforward characterization for the $(\varepsilon, \delta)$ matrix
mechanism. Notice that since the sensitivity of a strategy is measured according to its maximum column 
norm, a strategy with some columns that do not match the maximum is wasteful: we could add queries 
containing non-zero entries in the deficient columns without increasing sensitivity. These added queries will 
provide additional evidence that can be used to reduce overall error, so completing deficient columns can 
never result in a worse strategy. 

Definition 3.2 (Column Uniform Matrix). A matrix is column-uniform with respect to $L_p$ if each of its
columns has the same $L_p$ norm.

This suggests that column uniformity is a necessary condition for a minimal strategy. But we can also 
show that it is a sufficient condition. (The proof is included in App. D.1.)

Theorem 3.1. A strategy matrix A is minimal iff it is column-uniform. 

Another important property of the $(\varepsilon, \delta)$ matrix mechanism is that redundant queries do not lead to less
efficient strategies. We say a query in strategy matrix A is redundant if it is a scalar multiple of another
query in A. In this case the queries provide the same evidence about the database, but one may be scaled
relative to the other, resulting in lesser or greater accuracy. The following theorem shows that a strategy
with redundant queries is equivalent to a strategy with the redundancy removed.

Theorem 3.2 (Redundant Queries). Suppose strategy $A_1 = \{A \cup q \cup c_1 q\}$ for some strategy A, some
linear query q, and some constant $c_1$. Then $A_1$ is equivalent to the reduced strategy $A_2 = \{A \cup c_2 q\}$ where
$c_2 = \sqrt{1 + c_1^2}$.

The fact that the presence of redundant queries does not lead to less efficient strategies has important 
consequences for efficient strategy design algorithms. Because of this property, an algorithm can make a local 
choice to add a query to a strategy, adding the same query again later to augment its weight if necessary. 
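
The equivalence in Thm. 3.2 can be checked numerically. By Prop. 2.4, the error of any query under a strategy depends on the strategy only through $A^T A$, whose diagonal also holds the squared column norms that determine the sensitivity. The sketch below (assuming Python with NumPy; the random strategy, the query, and the constant are arbitrary illustrative values) confirms that $A_1$ and $A_2$ share the same $A^T A$:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 4))      # arbitrary full-rank strategy (illustrative)
q = rng.normal(size=(1, 4))      # arbitrary linear query
c1 = 0.7
c2 = np.sqrt(1.0 + c1 ** 2)

A1 = np.vstack([A, q, c1 * q])   # strategy with a redundant, rescaled copy of q
A2 = np.vstack([A, c2 * q])      # reduced strategy with a single weighted copy

gram1 = A1.T @ A1                # A^T A + (1 + c1^2) q^T q
gram2 = A2.T @ A2                # A^T A + c2^2 q^T q
```

Since the two matrices agree, both the sensitivity and the trace term of equation (1) agree for every query, which is the equivalence stated in the theorem.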

Thm. 3.1 and Thm. 3.2 do not hold for the e-matrix mechanism. Please see App. B for details. 
3.2 The singular value bound



Next we explain our main theoretical result, a lower bound on the error for a workload. We first define the
singular value decomposition of a workload. 

Definition 3.3 (Decomposition of Workload). Let W be any m x n query workload. The singular value
decomposition (SVD) of W is a factorization of the form $W = Q_W D_W P_W^T$, such that $Q_W$ is an m x m
orthogonal matrix, $D_W$ is an m x n diagonal matrix and $P_W$ is an n x n orthogonal matrix. When m > n,
the diagonal matrix $D_W$ consists of an n x n diagonal submatrix combined with $0^{(m-n) \times n}$.

The bound on the total error for a workload is derived from its singular value decomposition. 

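Theorem 3.3. (Singular Value Bound) Given an m x n workload W, let $\lambda_1, \lambda_2, \ldots, \lambda_n$ be the singular
values of W.

$$\min_A \mathrm{Error}_A(W) \ge P(\varepsilon, \delta)\, \frac{1}{n} \Big( \sum_{i=1}^{n} \lambda_i \Big)^2$$

where $P(\varepsilon, \delta) = \frac{2 \log(2/\delta)}{\varepsilon^2}$.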

A strategy matrix A can be considered to have two parts: its eigenvalues and its eigenvectors. When 
A is column-uniform, the total error can be represented as a polynomial of those two parts. The key idea 
behind the SVD bound is to ignore the constraint that A is column uniform and choose eigenvalues and 
eigenvectors separately to minimize the polynomial, which leads to an under-estimate of the total error. The 
details of this proof are presented in App. D.1.

We use SVDB(W) as shorthand for the singular value bound of workload W. Given a workload W,
SVDB(W) can be computed easily using standard methods for matrix decomposition, or it can be computed
directly from $W^T W$:

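Corollary 3.1. Given an n x n positive semi-definite matrix $W^T W$, let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of
$W^T W$.

$$\min_A \mathrm{Error}_A(W) \ge P(\varepsilon, \delta)\, \frac{1}{n} \Big( \sum_{i=1}^{n} \sqrt{\lambda_i} \Big)^2 \qquad (3)$$

where $P(\varepsilon, \delta) = \frac{2 \log(2/\delta)}{\varepsilon^2}$.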

For an m x n workload W, computing SVDB(W) takes $O(mn^2)$ time on W itself and takes $O(n^3)$ time on
$W^T W$, which significantly reduces the running time when the number of queries, m, is much larger than
n. In the case of large regular workloads such as AllRange and AllPredicate, $W^T W$ can be computed
directly. Then computing SVDB(W) improves from $O(n^5)$ to $O(n^3)$ for AllRange and $O(n^2 2^n)$ to $O(n^3)$
for AllPredicate.
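
Both ways of computing the bound can be sketched as follows, assuming Python with NumPy (function names are illustrative, and the logarithm in P is taken to be the natural logarithm). The first function works on W via its singular values as in Thm. 3.3; the second works on $W^T W$ via its eigenvalues as in Cor. 3.1.

```python
import numpy as np

def svd_bound(W, eps, delta):
    """P * (1/n) * (sum of singular values of W)^2."""
    P = 2.0 * np.log(2.0 / delta) / eps ** 2
    s = np.linalg.svd(W, compute_uv=False)
    return P * s.sum() ** 2 / W.shape[1]

def svd_bound_from_gram(WtW, eps, delta):
    """Same bound from the eigenvalues of the n x n matrix W^T W."""
    P = 2.0 * np.log(2.0 / delta) / eps ** 2
    lam = np.clip(np.linalg.eigvalsh(WtW), 0.0, None)  # clip tiny negative round-off
    return P * np.sqrt(lam).sum() ** 2 / WtW.shape[0]

def all_range_workload(n):
    """All ranges [a, b] over n cells, one indicator row per range."""
    return np.array([[1.0 if a <= i <= b else 0.0 for i in range(n)]
                     for a in range(n) for b in range(a, n)])

W = all_range_workload(4)
bound = svd_bound(W, eps=1.0, delta=0.01)
bound_gram = svd_bound_from_gram(W.T @ W, eps=1.0, delta=0.01)
```

As a sanity check, for W equal to the identity the bound is P n, which the identity strategy attains exactly under equation (1).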

Example 3.1. For AllRange(1024) and AllRange(32, 32), the SVD bound on the total error is $5.32 \times 10^6$
and $4.39 \times 10^6$, respectively. Below we report, as a ratio of the SVDB bound, the total error of these workloads
for a number of known strategy matrices: the workload itself, the identity, hierarchical and wavelet: 





                    workload    identity    hierarchical    wavelet
AllRange(1024)      50.58       33.75       2.14            1.84
AllRange(32, 32)    17.25       8.15        2.92            2.23



In the next section we present an algorithm that finds better strategies: the ratio of total error on AllRange(1024) 
and AllRange(32, 32) using the strategies computed by the algorithm is 1.26 and 1.08, respectively. 

3.3 Properties of the singular value bound 

The SVD bound has a number of properties which make it a reliable measure of workload complexity. Notice 
that in the expression for $\mathrm{Error}_A(W)$ in Prop. 2.4, the workload W appears only in the trace term, as
$W^T W$. An immediate implication is that a strategy that achieves minimal total error for a given workload
$W_1$ achieves minimal error for any workload $W_2$ such that $W_1^T W_1 = W_2^T W_2$. We therefore define the
following notion of equivalence: 
Definition 3.4 (Workload Equivalence). An $m_1 \times n$ workload $W_1$ and an $m_2 \times n$ workload $W_2$ are equivalent,
denoted $W_1 \equiv W_2$, iff $W_1^T W_1 = W_2^T W_2$.

Since the SVD bound is determined by the singular values, it follows immediately that equivalent workloads
have equal error bounds. That is, if $W_1 \equiv W_2$, then SVDB($W_1$) = SVDB($W_2$). In addition, as we
would hope, the SVD bound of a workload increases monotonically under the addition of new queries. Thus
if $W_1$ is a workload and $W_2$ is a workload that results from adding one or more linear queries to the rows of
$W_1$, then SVDB($W_1$) $\le$ SVDB($W_2$). A more general result is shown below, accounting for the fact that the
larger workload may be the augmentation of any workload equivalent to the smaller workload. (The proof 
is omitted.) 

Theorem 3.4. Let $W_1$, $W_2$ be workloads. If there exists a workload $W_2'$ equivalent to $W_2$ and the rows of
$W_2'$ contain all the rows of $W_1$, then SVDB($W_1$) $\le$ SVDB($W_2$).

Lastly, and most importantly, the singular value bound is tight. For a certain set of variable agnostic 
workloads, it is possible to directly construct a strategy achieving the bound. Intuitively, variable agnostic 
workloads treat every variable in the domain in an equivalent manner. The workload AllPredicate(n)
is variable agnostic, but AllRange(n) is not because, for example, variables in the middle of the domain
occur more often in the set of all range queries. In App. C we show how to construct the optimal strategy
for any variable agnostic workload, including for AllPredicate(n).

We do not know if the singular value bound is achievable for every workload (namely those that are
not variable-agnostic). However, experimentally we find that we can compute strategies that approach this
bound.

4 Strategy Selection Algorithms 

In this section we present an algorithm, along with a set of performance optimizations, for computing a 
close-to-optimal strategy for a given workload. 

4.1 The Level Selection Algorithm (LSA) 

The Level Selection Algorithm (LSA) takes as input a workload and returns a strategy matrix designed to 
offer low error for the workload. LSA is a localized search algorithm which builds a strategy matrix by 
choosing, at each step, the level of queries whose addition maximally reduces error for the workload. A level 
is a set of linear queries consisting of coefficients 0 or 1, determined by a partitioning of the variables, so that
each variable appears in exactly one query. Recall that, according to Thm. 3.1, minimal strategy matrices 
are precisely those that are column uniform. By constructing a strategy by levels, the LSA algorithm always 
chooses among minimal strategies. 

To construct a new level, the LSA algorithm starts with the simplest level: the query [1, 1, ... 1] which 
is the sum of all cells. The algorithm then performs a top-down search, recursively bisecting the query into 
smaller queries such that the error of the workload is maximally reduced. Once a level is completed, the 
LSA algorithm computes the total error of the workload with and without that level. If the total error is 
not improved by the level, the algorithm terminates and outputs the current strategy. Otherwise the level 
is added to the output strategy, and, subject to a user-defined threshold k on the number of levels, the next 
level is then constructed. 

The search space of LSA is quite general. Both the hierarchical strategies from [9] and (a strategy 
equivalent to) the wavelet strategy [13] can be constructed by levels. But the LSA algorithm is not limited 
to hierarchical strategies because there is no constraint that lower level queries are contained in higher level 
queries. Further, while levels are constructed initially with coefficients of 0 or 1, the final output strategy
may have a complex set of weights on queries, because redundant queries may be added in different levels. 
These redundant queries can be combined, as shown in Thm. 3.2, by appropriate weighting. 

Example 4.1. For W = AllRange(512), the LSA algorithm terminates at 33 levels. The output strategy 
consists of 4746 queries, reducible to 1931 non-redundant queries. 






Program 4.1 The Level Selection Algorithm (LSA) 

Input: A workload matrix $W$ and the size of each dimension of the domain. An upper bound $k$ on the maximum 
number of levels. 

Output: A strategy matrix $A$. 

1: $A = I$; 
2: repeat 
3:   $S_A = [1, 1, \ldots, 1]$; 
4:   repeat 
5:     for each query $q$ in $S_A$ do 
6:       find a pair $(i, j)$, a split at position $j$ on dimension $i$, such that using the queries 
           $q_1$ = {buckets with value 1 in $q$ whose $i$-th dimension is less than $j$} 
           $q_2$ = {buckets with value 1 in $q$ whose $i$-th dimension is greater than or equal to $j$} 
         to take the place of query $q$ in $S_A$, with $A' = \begin{bmatrix} A \\ S_A \end{bmatrix}$, minimizes $\mathrm{Error}_{A'}(W)$. 
7:     end for 
8:     if $\mathrm{Error}_{A'}(W)$ is reduced then update $S_A$. 
9:   until $S_A$ was not updated; 
10:  if $\mathrm{Error}_{A'}(W) < \mathrm{Error}_A(W)$ then $A = A'$. 
11: until $A$ was not updated or $\|A\|_2^2 > k$; 
12: return $A$. 
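
The control flow of Program 4.1 can be sketched for the one-dimensional case as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the error is computed up to the constant privacy factor as $\|A\|_2^2\,\mathrm{trace}(W^TW(A^TA)^{-1})$, every candidate is evaluated by naive matrix inversion, a split is accepted whenever it lowers the error of the level under construction (one reading of step 8), and the names `total_error`, `all_range`, `build_level` and `lsa_1d` are hypothetical.

```python
import numpy as np

def total_error(gram, A):
    """||A||_2^2 * trace(W^T W (A^T A)^{-1}), ignoring the privacy constant."""
    sens2 = np.max(np.sum(A * A, axis=0))          # squared L2 sensitivity
    return sens2 * np.trace(gram @ np.linalg.inv(A.T @ A))

def all_range(n):
    """All one-dimensional range queries over n cells (one row per query)."""
    rows = []
    for lo in range(n):
        for hi in range(lo, n):
            q = np.zeros(n)
            q[lo:hi + 1] = 1.0
            rows.append(q)
    return np.array(rows)

def build_level(gram, A, n):
    """Steps 3-9: start from the total query and greedily bisect queries."""
    level = [np.ones(n)]
    err = total_error(gram, np.vstack([A] + level))
    improved = True
    while improved:
        improved = False
        for q in list(level):                      # snapshot of the level
            cells = np.flatnonzero(q)
            best = None
            for j in cells[1:]:                    # candidate split positions
                q1 = q.copy()
                q1[j:] = 0.0                       # cells of q left of j
                q2 = q - q1                        # cells of q at or right of j
                cand = [r for r in level if r is not q] + [q1, q2]
                e = total_error(gram, np.vstack([A] + cand))
                if best is None or e < best[0]:
                    best = (e, cand)
            if best is not None and best[0] < err - 1e-12:
                err, level = best
                improved = True
    return level, err

def lsa_1d(W, k):
    """Steps 1-12 for a one-dimensional domain, with at most k levels."""
    n = W.shape[1]
    gram = W.T @ W
    A = np.eye(n)
    err = total_error(gram, A)
    for _ in range(k):
        level, new_err = build_level(gram, A, n)
        if new_err >= err:                         # the level does not help
            break
        A, err = np.vstack([A] + level), new_err
    return A, err

W = all_range(8)
A, err = lsa_1d(W, k=3)
print(A.shape, err, total_error(W.T @ W, np.eye(8)))
```

The sketch trades efficiency for readability: each candidate costs a full inversion, which corresponds to the unoptimized cost analysis rather than the incremental update.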



4.2 Complexity of the LSA algorithm 

Note that Program 4.1 can use $W^T W$ instead of $W$ as input without impacting any computation or the final 
output. This implies that equivalent workloads produce equivalent outputs, and also that the complexity of 
Program 4.1 does not depend on the number of queries in $W$. 
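
A small numerical check of this observation is sketched below. It assumes the total error is computed, up to the privacy constant, as $\|A\|_2^2\,\mathrm{trace}(W^TW(A^TA)^{-1})$; the function name `total_error` and the example matrices are illustrative only.

```python
import numpy as np

def total_error(gram, A):
    """Error as a function of the n x n matrix W^T W only (privacy constant omitted)."""
    sens2 = np.max(np.sum(A * A, axis=0))
    return sens2 * np.trace(gram @ np.linalg.inv(A.T @ A))

W1 = np.array([[1.0, 1.0, 0.0],
               [0.0, 1.0, 1.0],
               [1.0, 1.0, 1.0]])

# Rotating the rows of W1 by an orthogonal matrix gives a different workload
# with the same W^T W.
c, s = np.cos(0.3), np.sin(0.3)
Q = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
W2 = Q @ W1

# A workload with 100 times as many queries still yields a 3 x 3 input.
W_big = np.tile(W1, (100, 1))

A = np.vstack([np.eye(3), np.ones((1, 3))])      # an arbitrary strategy
e1 = total_error(W1.T @ W1, A)
e2 = total_error(W2.T @ W2, A)
print(e1, e2, (W_big.T @ W_big).shape)
```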

In each iteration, the algorithm needs to find the split point which maximizes the reduction of total error. 
This occurs in Step 6 and it is the most time-consuming part of Prog. 4.1. The total error for each possible 
split has to be computed, which means computing the total error $O(n)$ times to determine one split point. 
There are at most $n$ split points, so the total error is computed at most $O(n^2)$ times over all the iterations. Each 
total error computation requires recomputing $(A'^T A')^{-1}$ for an updated strategy $A'$. This requires $O(n^3)$ 
time by ordinary matrix inversion, so the entire algorithm would take $O(kn^5)$ time, where the number of 
levels is bounded by $k$. 

However, because only three rows of $S_A$ are updated during each iteration, we can improve the running 
time by exploiting a more efficient incremental computation of the matrix inverse. Details of this technique 
and the proof of the following theorem are included in App. D.2. 

Theorem 4.1. The LSA algorithm can be implemented in $O(kn^4)$ time, where $k$ is the maximum number 
of levels. 

In experiments we have run the LSA algorithm to termination and found that the number of levels is 
much smaller than $n$ and that it converges quickly. Fig. 1 shows the convergence of the error as a function 
of the bound $k$ on the number of levels for W = AllRange(512). In addition, although the worst case cost 
of the LSA algorithm is $O(kn^4)$, empirically we find that the running time increases approximately with 
$O(kn^3)$. 

4.3 The LSA Algorithm on Large Domains 

In the remainder of this section we present two techniques for efficiently scaling the LSA algorithm to larger 
domains. 

Figure 1: The total error of strategies found by LSA as the bound k on the number of levels increases. 



4.3.1 Workload Separation 

The domain for workloads over multi-dimensional data can increase with the product of the attribute do- 
mains. To avoid this, many common workloads, such as those resulting from sets of low-order marginals, 
can be decomposed so that each dimension can be treated separately. This process of workload separation 
can significantly improve efficiency with little impact on overall error. 

Definition 4.1. Given a workload $W$, query $q$ is called a separating query for $W$ if there exists a list of 
subsets $W_1, W_2, \ldots, W_k$ of $W$ for which the estimate of $W_i$ does not involve queries in 
$W_1, \ldots, W_{i-1}, W_{i+1}, \ldots, W_k$ when the answer of query $q$ is fixed. The subsets $W_1, W_2, \ldots, W_k$ are 
called separated workloads of $W$ under query $q$. 

Given a workload W and one of its separating queries q, notice that answering queries in each separated 
workload does not involve other queries in the workload. Thus workload separation can be applied in two 
phases. In the first phase, the separating query q is answered directly, using the Gaussian mechanism with 
fixed accuracy. In the second phase, the strategies for each separated workload of W are computed with 
the LSA algorithm. Since each separated workload contains queries with simpler structure, the buckets that 
always appear at the same time in one separated workload can be grouped into a single bucket so as to 
reduce the domain size. 

One of the key applications of workload separation is for workloads consisting of sets of marginal queries 
and predicate queries over marginals, explained in the following theorem. 

Theorem 4.2. Let $W_1, \ldots, W_k$ be sets of predicate queries over one-way marginals on $k$ different 
dimensions. Let $q$ be the sum of all the entries on the domain. With the answer of $q$ given, the estimate of any 
query in $W_i$ does not involve queries in $W_1, \ldots, W_{i-1}, W_{i+1}, \ldots, W_k$. 

More generally, when a workload contains predicate queries over up to $k$-way marginals, such as datacube 
queries, workload separation can then be applied recursively, reducing the problem to low-dimensional or even 
one-dimensional problems. 

Example 4.2. Consider a query workload consisting of all one-dimensional range queries over each of two 
dimensions, with domains of size $n_1$ and $n_2$. Such a workload would typically be represented over a set of 
cell counts of size $n_1 \times n_2$, and the LSA algorithm would take $O(k(n_1 n_2)^4)$ time. The separated workloads 
are AllRange($n_1$) and AllRange($n_2$) and the cost of running LSA is reduced to $O(k(n_1^4 + n_2^4))$. 
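
To make the gap in Example 4.2 concrete, the two asymptotic cost expressions can be evaluated for a specific domain. The sketch below only plugs numbers into the $O(kn^4)$ model with all constants dropped, so it illustrates the scaling and does not predict running times; `lsa_cost` is a hypothetical helper.

```python
def lsa_cost(n, k=1):
    """Worst-case cost model k * n^4 for LSA on a domain of size n (constants dropped)."""
    return k * n ** 4

n1 = n2 = 32
joint = lsa_cost(n1 * n2)                  # LSA over the full n1 x n2 grid of cells
separated = lsa_cost(n1) + lsa_cost(n2)    # LSA over each dimension separately
speedup = joint / separated
print(joint, separated, speedup)           # 1099511627776 2097152 524288.0
```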

4.3.2 Workload Generalization 

As a second optimization, called workload generalization, we initially merge cells in the domain, creating 
an approximation of the original workload over a much smaller domain. The LSA algorithm is executed 
on this generalized workload to produce an initial generalized strategy. Then, in a second phase, we run a 
slightly modified Program 4.1 over each merged cell. The modified algorithm computes error and sensitivity 
with the strategy of the first phase taken into account. The strategies of both phases are then combined to 
produce the final strategy. 

Figure 2: The SVD Bound for Sampled Workloads 

The cost of this method is a function of the domain size $n$ and the generalized domain size $m$. The first phase 
runs Prog. 4.1 once over domain size $m$ and the second phase runs the modified Prog. 4.1 $m$ times over domain 
size $n/m$. Thus, the total cost of workload generalization is $O(k(m^4 + n^4/m^3))$. If $m$ is set to $O(n^{1/3})$, then 
the total cost can be reduced to $O(kn^3)$. This is the same asymptotic cost as ordinary matrix multiplication, thus it is 
a natural lower bound for the running time of an algorithm of this form. Workload generalization results 
in much faster execution time, but sacrifices solution quality in comparison to the LSA algorithm, resulting in 
modestly higher error. We evaluate this experimentally in the next section. 
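
The cost expression above can be evaluated directly. The following sketch assumes the cost model $k(m^4 + n^4/m^3)$ with constants dropped and rounds $n^{1/3}$ to the nearest integer; `generalized_cost` is a hypothetical helper used only for illustration.

```python
def generalized_cost(n, m, k=1):
    """Cost model: one LSA run on m merged cells plus m runs on n/m cells each."""
    return k * (m ** 4 + n ** 4 / m ** 3)

n = 1024
m = round(n ** (1 / 3))            # generalized domain size near n^(1/3)
cost = generalized_cost(n, m)
ratio_to_cubic = cost / n ** 3     # close to 1 when the cost is about n^3
ratio_to_plain = n ** 4 / cost     # model speedup over plain LSA, about m^3
print(m, ratio_to_cubic, ratio_to_plain)
```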

5 Experimental Evaluation 

There are two parts to the experimental evaluation of our techniques. First, we study the complexity of 
different workloads using the singular value bound. Second, we evaluate the quality of strategies generated 
by the LSA algorithm, as well as the effectiveness of the workload separation and generalization techniques. 
Recall from Sec. 2.3 that the error of the matrix mechanism, and therefore the choice of strategy, is indepen- 
dent of the database instance. As a consequence, the results that follow do not use an input database — they 
are purely an analysis of workloads and the error rates possible in answering these workloads. 

5.1 Workload Complexity 

We can use the singular value bound to gain insight into the complexity of workloads — the essential hardness 
of answering a set of queries under the matrix mechanism. We begin by reporting the SVD bound for all 
range workloads on five different domains, all with domain size 1024 but different numbers of dimensions. 
The comparison is shown as the lightest grey bars in Figure 3(a), where $2^{10}$ means the 10-dimensional 
domain with each dimension size equal to 2. As the number of dimensions increases with the overall domain 
fixed, the total number of range queries decreases, leading to lower complexity workloads, as shown by the 
decreasing SVD bound in the figure. 

Our next experiment, shown in Figure 2, samples a matching number of distinct queries from each 
workload and considers the average error of the sampled query sets. The complexity of a sampled query set 
is measured by the ratio between average error of the sample and the average error of the complete query 
set. Moreover, the experiment uses both uniform sampling and a biased sampling method 1 to study whether 
the SVD bound is related to the sampling method. 

1 Biased sampling chooses queries whose center is far away from the center of the domain, and the probability for 
choosing a range decreases exponentially with the distance between the center of the query and the center of the 
domain. The motivation for this sampling method is to obtain samples whose properties differ from those of 
uniform sampling. It can also be viewed as modeling users who are most interested in regions at the extremes of 
the domain and therefore tend to issue queries that are far away from the center. 

Figure 3: Performance of the LSA algorithm and auxiliary techniques. 

Figure 2 shows that even very small workloads of uniformly sampled queries (10000 queries, or roughly 
2-6% of the entire workload of range queries) reach the error levels of the entire workload. When biased 
sampling is used to construct workloads, the error ratio approaches 1 more slowly. Overall, this may have 
important consequences for the strategy selection problem. It suggests that, for workloads of just modest 
size, there is no error penalty in adopting a query strategy tuned to a larger regular workload such as 
AllRange for the appropriate domain size. 

5.2 Effectiveness of the LSA Algorithm 

To assess the effectiveness of the LSA algorithm we compare the total error under its strategies with other 
strategies from the literature, and with the optimal error as derived from the SVD bound. The first 
experiment, shown in Figure 3(a), explores AllRange workloads over the five multi-dimensional domains 
mentioned above. We compare the LSA-computed strategy with the wavelet strategy [13] and the 
hierarchical strategy [9]. Recall that both these strategy matrices were originally designed for $\epsilon$-differential 
privacy, but in fact they perform even better under $(\epsilon, \delta)$-differential privacy. The hierarchical tree algorithm 
only appears in the first group of comparisons because it is oriented towards one-dimensional queries. 

Both the wavelet and the hierarchical strategies perform well in the one-dimensional case: their er- 
ror is 1.53 times the optimal and 1.78 times the optimal, respectively. The LSA algorithm finds a bet- 
ter strategy, 1.26 times the optimal. In higher dimensions the performance improvement over wavelet is 
much larger, demonstrating the importance of adapting the strategy to the workload. For example, for 
AllRange(16, 8, 8), LSA error is 1.07 times optimal while wavelet is 4.17 times optimal. Overall, the error 
of LSA strategies comes very close to the optimal singular value bound in higher dimensions. 

To further test the benefits of adapting the strategy to the workload, we considered random workloads 
of range queries grouped by average coverage percentage, which is the average percentage of cells that are 
covered by a query. As shown in Figure 3(b) (note the logarithmic scale), the LSA strategies have error 
rates very close to optimal and outperform the wavelet strategy by more than an order of magnitude in 
the most extreme case. 

Next we evaluate the workload separation and workload generalization techniques for improved efficiency 
of the LSA algorithm on large domains. Figure 3(c) shows the workload separation technique on range 
queries over pairs of one-way marginals with increasing domain sizes. Using LSA with workload separation 
has a negligible penalty in error: it is almost identical to the original LSA (and the optimal SVD bound), 
yet the computational cost is improved by more than three orders of magnitude: on domain $32 \times 32$, the 
running time is reduced from 5079 seconds to merely 2.44 seconds. 

To evaluate the workload generalization techniques we considered skew-sampled workloads on one 
dimension, since their properties differ from AllRange workloads. Figure 3(d) compares the performance 
of LSA generalized on domain size $n^{1/3}$, the original LSA algorithm, the wavelet strategy, and the 
singular value bound. Generalized LSA performs almost as well as the original LSA algorithm. However, the 
computational cost is decreased significantly: for domain 1024, the cost is reduced by a factor of 18. 

6 Related Work 

The matrix mechanism [10] analyzed in a unified framework two prior techniques for accurately answering 
range queries. The first used a wavelet transformation [13]; the second used a hierarchical set of queries 
followed by inference [9]. The original work on the matrix mechanism focused primarily on $\epsilon$-differential 
privacy, although $(\epsilon, \delta)$-differential privacy was considered briefly. In both cases, it was shown that 
semi-definite programming could be used to compute a minimum error strategy for a workload, but solving such 
programs is infeasible. The present work provides tractable methods for computing low-error strategies 
for any given workload. 

Workload complexity under $\epsilon$-differential privacy has been studied before. Hardt and Talwar [8] present a 
lower bound for randomly generated predicate workloads. More generally, Blum et al. [1] show that workload 
complexity is related to the VC dimension of the workload. This measure is not directly comparable to our 
singular value bound since the former concerns $\epsilon$-differential privacy. Also, the VC dimension of any 
one-dimensional range workload is constant, while our measure captures more detailed differences in workloads. 

Put in our terms, Xiao et al. [14] propose a method for computing a strategy matrix using KD-trees 
and private accesses to the database. Their query answers could be improved and made consistent by 
employing the matrix mechanism. Roth and Roughgarden [12] describe a data-dependent mechanism to 
answer predicate queries on databases with 0-1 entries. Hardt et al. [7] provide a linear time algorithm for 
the same query and database setting. The trade-off between accuracy and efficiency among data-dependent 
and data-independent methods deserves further investigation. 

7 Conclusion 



Standard differentially-private mechanisms for answering a workload of queries require noise determined 
by the sensitivity of the queries. We have shown that it is possible to satisfy the privacy condition with 
substantially less noise, and that the noise required is related to the spectral properties of the workload. Our 
methods allow the privacy mechanism to be efficiently adapted to the workload, achieving error improvements 
of as much as one order of magnitude over prior techniques. 

A Data model and Domain 



In this section we provide a brief concrete example to explain how a workload of counting queries can be 
expressed as a set of linear queries over a vector of cell counts x. Consider the following relational schema 
describing students: 

R = (name, gradyear, gender, gpa) 

and suppose dom(gradyear) = {2011, 2012, 2013, 2014} and dom(gender) = {M, F}. If the desired workload 
consists of statistics about the gender of students graduating in specified years, then we can define the cells 
of x as the crossproduct of dom(gradyear) and dom(gender), which has size n = 8. That is, 

x = [cnt(2011, M), cnt(2011, F), ..., cnt(2014, M), cnt(2014, F)] 

Then the following matrix represents a workload of five queries: 



\[
W = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 1 & 0 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 1 & -1 & -1
\end{bmatrix}
\]

By rows, the queries represented by W are: $Q_1$: the count of all students; $Q_2$: the count of students with 
gradyear $\in$ [2011, 2012]; $Q_3$: the count of female students with gradyear $\in$ [2011, 2012]; $Q_4$: the count of 
male students with gradyear $\in$ [2011, 2012]; $Q_5$: the difference between 2013 grads and 2014 grads. 

Note that each query of interest should be listed in the workload. We do not omit queries whose answers 
could be calculated from other queries in the workload. For example, Q2 is included in the workload even 
though it could be computed by summing Q3 and Q4. Noise accumulates when summing noisy query answers, 
so we include each desired query and minimize the total error. Also, the relative accuracy of a query in the 
workload can be controlled by linearly scaling its row by a coefficient. 

For this domain, the workload of all range queries is written AllRange(4, 2) and consists of all 
two-dimensional range queries over gradyear and gender, where the possible "ranges" for gender are simply M, 
F, or (M $\vee$ F). This workload consists of 30 queries. The workload AllPredicate(8) includes all $2^8$ linear 
queries expressed over x with coefficients in {0, 1}. 

Also note that for the workload W above, the pairs of elements $x_5, x_6$ and $x_7, x_8$ always appear together. 
The included queries do not distinguish between males and females in year 2013, or in year 2014. For this 
reason, each pair of variables could be replaced by a single variable, reducing the vector x to size 6. 
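
The merging of cells described above can be checked numerically. In the sketch below the matrix `W` is written out from the five row descriptions ($Q_1$ to $Q_5$) and the cell order of x, so it should be read as a reconstruction for illustration; the data vector `x` is arbitrary and the variable names are hypothetical.

```python
import numpy as np

# Columns follow x = [cnt(2011,M), cnt(2011,F), ..., cnt(2014,M), cnt(2014,F)].
W = np.array([
    [1, 1, 1, 1, 1, 1, 1, 1],      # Q1: all students
    [1, 1, 1, 1, 0, 0, 0, 0],      # Q2: gradyear in [2011, 2012]
    [0, 1, 0, 1, 0, 0, 0, 0],      # Q3: female, gradyear in [2011, 2012]
    [1, 0, 1, 0, 0, 0, 0, 0],      # Q4: male, gradyear in [2011, 2012]
    [0, 0, 0, 0, 1, 1, -1, -1],    # Q5: 2013 grads minus 2014 grads
], dtype=float)

# x5, x6 (year 2013) and x7, x8 (year 2014) have identical columns in W.
pairs = [(4, 5), (6, 7)]
pairs_identical = all(np.array_equal(W[:, i], W[:, j]) for i, j in pairs)

# Keep one column per merged pair and sum the corresponding cell counts.
W6 = W[:, [0, 1, 2, 3, 4, 6]]
x = np.arange(1.0, 9.0)            # arbitrary example cell counts
x6 = np.array([x[0], x[1], x[2], x[3], x[4] + x[5], x[6] + x[7]])

print(pairs_identical, W6.shape, np.allclose(W @ x, W6 @ x6))
```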

B Comparison of the $\epsilon$- and $(\epsilon, \delta)$-Matrix Mechanisms 

In this appendix we consider how our main results, which apply to approximate differential privacy, compare 
to corresponding results under standard differential privacy. We begin with definitions for the $\epsilon$-matrix 
mechanism in Sec. B.1. Then, in Sec. B.2, we describe key differences that make the strategy selection 
problem harder under $\epsilon$-differential privacy. While the privacy guarantees of the two mechanisms are formally 
distinct, for conservative settings of $\delta$, one may be indifferent to the two guarantees and consider which 
mechanism offers lower error for a fixed $\epsilon$. In Sec. B.3 we show that the error of the $(\epsilon, \delta)$-matrix mechanism 
is favorable for a wide range of strategies including those considered in this paper. 

B.1 Definitions, $\epsilon$-Matrix Mechanism 

Standard differential privacy is defined as follows: 

Definition B.1 (Differential Privacy). A randomized algorithm $\mathcal{K}$ is $\epsilon$-differentially private if for any 
instance $I$, any $I' \in \mathrm{nbrs}(I)$, and any subset of outputs $S \subseteq \mathrm{Range}(\mathcal{K})$, the following holds: 

\[ \Pr[\mathcal{K}(I) \in S] \le \exp(\epsilon) \times \Pr[\mathcal{K}(I') \in S]. \]

Figure 4: Strategy matrices for domain n = 4. H is the hierarchical strategy. Y is the wavelet strategy. 
$Y_1$ has redundant queries which are reduced in $Y_2$. $Y, Y_1, Y_2$ are all equivalent under the $(\epsilon, \delta)$-matrix 
mechanism. Under the $\epsilon$-matrix mechanism, $Y_2$ is strictly more efficient than $Y_1$ because $\|Y_2\|_1 < \|Y_1\|_1$. 



Under $\epsilon$-differential privacy, query sensitivity is measured using the $L_1$ distance. For a query matrix W, 
the $L_1$ sensitivity is the maximum $L_1$ norm of the columns of W. 

Proposition B.1 ($L_1$ Query matrix sensitivity). The $L_1$ sensitivity of a query matrix W is denoted $\|W\|_1$ 
and is defined as follows: 

\[ \|W\|_1 = \max_{x' \in \mathrm{nbrs}(x)} \|Wx - Wx'\|_1 = \max_{W_i \in \mathrm{cols}(W)} \|W_i\|_1 \]

The standard mechanism for achieving $\epsilon$-differential privacy adds Laplace noise calibrated to the $L_1$ 
sensitivity. We use $\mathrm{Laplace}(b)^m$ to denote a column vector consisting of $m$ independent samples drawn from 
a Laplace distribution with mean 0 and scale $b$. 

Proposition B.2 (Laplace mechanism). Given an $m \times n$ query matrix W, the randomized algorithm $\mathcal{L}$ that 
outputs the following vector is $\epsilon$-differentially private: 

\[ \mathcal{L}(W, x) = Wx + \mathrm{Laplace}\left(\frac{\|W\|_1}{\epsilon}\right)^m \]

The matrix mechanism is defined almost identically to Prop. 2.2, but with Laplace noise in place of 
Gaussian noise. 

Proposition B.3. ($\epsilon$-Matrix Mechanism [10]) Let A be a full rank $m \times n$ strategy matrix and let W 
be any $p \times n$ workload matrix. Then the randomized algorithm $\mathcal{M}_A$ that outputs the following vector is 
$\epsilon$-differentially private: 

\[ \mathcal{M}_A(W, x) = Wx + WA^{+}\,\mathrm{Laplace}(b)^m, \]

where $b = \|A\|_1 / \epsilon$. 
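
A minimal sketch of Prop. B.3 in NumPy follows. It assumes A has full column rank, so that the pseudo-inverse $A^{+}$ recovers x from Ax; the function names and the example inputs are illustrative and are not part of the paper.

```python
import numpy as np

def l1_sensitivity(A):
    """Maximum L1 norm over the columns of A."""
    return np.max(np.sum(np.abs(A), axis=0))

def eps_matrix_mechanism(W, A, x, epsilon, rng):
    """Return W x + W A^+ Laplace(b)^m with b = ||A||_1 / epsilon."""
    b = l1_sensitivity(A) / epsilon
    noise = rng.laplace(loc=0.0, scale=b, size=A.shape[0])
    return W @ x + W @ np.linalg.pinv(A) @ noise

rng = np.random.default_rng(0)
x = np.array([3.0, 5.0, 2.0, 7.0])                 # example cell counts
W = np.array([[1.0, 1.0, 1.0, 1.0],
              [1.0, 1.0, 0.0, 0.0]])               # two range queries
A = np.vstack([np.eye(4), np.ones((1, 4))])        # identity plus the total query
answers = eps_matrix_mechanism(W, A, x, epsilon=0.5, rng=rng)
print(answers)
```

Because the noise is added to the strategy answers and then mapped through $WA^{+}$, the noisy workload answers are consistent with a single estimate of x.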

The analysis of error for the $\epsilon$-matrix mechanism differs only in the sensitivity and $\epsilon$ terms: 

Proposition B.4. (Total Error) Given a workload W, the total error of answering W using the $\epsilon$-matrix 
mechanism with query strategy A is: 

\[ \mathrm{Error}_A(W) = \frac{2}{\epsilon^2}\,\|A\|_1^2\,\mathrm{trace}(W^T W (A^T A)^{-1}) \quad (4) \]

B.2 Main Distinctions 



The analysis of error differs between the two mechanisms because of the difference in sensitivity metrics. 
From Prop. B.4 and Prop. 2.4 the two expressions for error can be compared (ignoring privacy parameters): 

\[ \epsilon\text{-}\mathrm{Error}_A(W) \propto \|A\|_1^2\,\mathrm{trace}(W^T W (A^T A)^{-1}) \]

\[ (\epsilon, \delta)\text{-}\mathrm{Error}_A(W) \propto \|A\|_2^2\,\mathrm{trace}(W^T W (A^T A)^{-1}) \]

The trace terms are identical, so these expressions differ only in the sensitivity metric applied to A. The 
implications of this small difference are significant for optimizing the expressions, primarily because $\|A\|_2$ 
is uniquely determined by $A^T A$ (in fact, $\|A\|_2^2$ is the largest diagonal entry of the matrix $A^T A$). This means that 
whenever two different strategies A and B are such that $A^T A = B^T B$, then under the $(\epsilon, \delta)$-matrix mechanism 
they have equal error, so $A \equiv B$. But $\|A\|_1$ is not determined by $A^T A$, so it is possible to have 
$A^T A = B^T B$ while $\|B\|_1 < \|A\|_1$. It then follows that under the $\epsilon$-matrix mechanism A and B are not 
equivalent. Instead strategy B is strictly more efficient than A. 

This difference invalidates many results that hold for the $(\epsilon, \delta)$-matrix mechanism. First of all, the 
optimal strategy under the $(\epsilon, \delta)$-matrix mechanism can be found via a semi-definite program (SDP) that 
computes the matrix $A^T A$ to minimize the total error formula. The solution to this program is insufficient, 
however, under the $\epsilon$-matrix mechanism, because we need to compute a strategy A with minimum $L_1$ 
sensitivity. To do so, extra non-convex constraints must be added, which requires solving a non-convex 
extension of an SDP (an SDP with rank constraints) to get an optimal strategy. 

Further, the singular value bound does not hold under the $\epsilon$-matrix mechanism and the properties of 
redundant queries under the $\epsilon$-matrix mechanism change. Unlike Thm. 3.2, the presence of redundant queries 
leads to unneeded error: 

Theorem B.1 (Redundant Queries). Suppose strategy $A_1 = \{A \cup q \cup c_1 q\}$ for some strategy A, some 
linear query $q$, and some constant $c_1$. Then, under the $\epsilon$-matrix mechanism, the reduced strategy $A_2 = \{A \cup c_2 q\}$, 
where $c_2 = \sqrt{1 + c_1^2}$, is strictly more efficient than $A_1$. 

Fig. 4 shows an example of a strategy matrix with redundant queries ($Y_1$) and a reduced strategy with 
lower $L_1$ sensitivity. 
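
The effect described in Theorem B.1 can be observed on a tiny example. The sketch below picks an arbitrary strategy (the identity), an arbitrary query q and the constant $c_1 = 2$, all chosen only for illustration, and compares $A^TA$ and the two sensitivities of the redundant and reduced strategies.

```python
import numpy as np

A = np.eye(3)                                   # base strategy (illustrative)
q = np.array([1.0, 1.0, 0.0])                   # a linear query
c1 = 2.0
c2 = np.sqrt(1.0 + c1 ** 2)

A1 = np.vstack([A, q, c1 * q])                  # strategy with redundant queries
A2 = np.vstack([A, c2 * q])                     # reduced strategy

def l1_sens(M):
    return np.max(np.sum(np.abs(M), axis=0))

def l2_sens(M):
    return np.sqrt(np.max(np.sum(M * M, axis=0)))

same_gram = np.allclose(A1.T @ A1, A2.T @ A2)   # identical trace term
print(same_gram, l2_sens(A1), l2_sens(A2), l1_sens(A1), l1_sens(A2))
```

Both strategies share the same $A^TA$ and the same $L_2$ sensitivity, so they are interchangeable under the $(\epsilon, \delta)$-matrix mechanism, while the reduced strategy has the smaller $L_1$ sensitivity.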

Finally, column uniformity no longer characterizes the minimal strategies (as in Thm. 3.1) and the quality 
of the output of the LSA algorithm tends to be lower. Intuitively, in the $\epsilon$-matrix mechanism, the algorithm 
only has one chance to add a query to the strategy and needs to choose the correct weight before knowing 
how the remainder of the strategy will be built. 

B.3 Error Comparison 

Suppose we fix $\epsilon$ in both mechanisms and choose $\delta = 2/n^2$ in the $(\epsilon, \delta)$-matrix mechanism, where $n$ is the domain 
size. This is a conservative choice for $\delta$, which results in error of the $(\epsilon, \delta)$-matrix mechanism equal to: 

Errora(W) = ^(logi nP|| 2 ) 2 trace(W T W(A T A)- 1 ). 

Comparing this equation with Eq. (4) indicates that the $(\epsilon, \delta)$-matrix mechanism introduces less error whenever 
||A||i > log 3 n||A|| 2 . In particular, for strategy matrices A which consists of predicate queries (such as 
the output of LSA algorithm, hierarchical strategies [9] and (a strategy equivalent to) the wavelet strategy 
[13]), $\|A\|_2 = \sqrt{\|A\|_1}$. Using such a strategy matrix A, when $\|A\|_1 > \sqrt{\log n}$, the $(\epsilon, \delta)$-matrix mechanism 
provides less error than the $\epsilon$-matrix mechanism. 

C The Optimal Strategy for Variable Agnostic Workloads 

To demonstrate that the singular value bound is a tight lower bound, we discuss a special type of workload 
called a variable agnostic workload and construct a strategy for such workloads that introduces error as 
low as the singular value bound. 

Definition C.1 (Variable agnostic workload). A workload W is variable agnostic if $W^T W$ is unchanged when 
we swap any two columns of W. 

For any variable agnostic workload W, $W^T W$ has the following form, for constants $a$ and $b$: 

\[ W^T W = \begin{bmatrix} a & b & \cdots & b \\ b & a & \cdots & b \\ \vdots & \vdots & \ddots & \vdots \\ b & b & \cdots & a \end{bmatrix} \]

The next theorem gives a strategy that works on variable agnostic workloads of size $n$ and has 
total error equal to the singular value bound. 

Theorem C.1. For positive integer $k$ and $n = 2^k$, let W be any $m \times n$ variable-agnostic workload. Then 
$W^T W$ has the following form, for constants $a$ and $b$: 

\[ W^T W = \begin{bmatrix} a & b & \cdots & b \\ b & a & \cdots & b \\ \vdots & \vdots & \ddots & \vdots \\ b & b & \cdots & a \end{bmatrix} \]

and there exists a strategy A which attains the singular value bound, which is 

\[ \mathrm{SVDB}(W) = \frac{1}{n}\left(\sqrt{a + (n-1)b} + (n-1)\sqrt{a-b}\right)^2. \]

Proof. Let $Q_1 = \begin{bmatrix} 1 & 1 \\ 1 & -1 \end{bmatrix}$ and $Q_k = \begin{bmatrix} Q_{k-1} & Q_{k-1} \\ Q_{k-1} & -Q_{k-1} \end{bmatrix}$. Here we prove the following result: $Q_k$ is an eigenvector 
matrix for $W_k^T W_k$, where $W_k$ is any $2^k \times 2^k$ variable-agnostic workload. The eigenvalue corresponding to the 
first column of $Q_k$ is $a + (2^k - 1)b$ and the eigenvalue corresponding to all the other columns is $a - b$, where $a$ 
is the diagonal entry of $W_k^T W_k$ and $b$ is the off-diagonal entry of $W_k^T W_k$. 

One can verify the result is true when $k = 1$. Now suppose the result is true for $k - 1$. Then 

\[
W_k^T W_k Q_k
= \begin{bmatrix} W_{k-1}^T W_{k-1} & bJ_{2^{k-1}} \\ bJ_{2^{k-1}} & W_{k-1}^T W_{k-1} \end{bmatrix}
  \begin{bmatrix} Q_{k-1} & Q_{k-1} \\ Q_{k-1} & -Q_{k-1} \end{bmatrix}
= Q_k \begin{bmatrix} a + (2^k - 1)b & & & \\ & a - b & & \\ & & \ddots & \\ & & & a - b \end{bmatrix}
\quad (5)
\]

Here $J$ is the matrix whose entries are all 1. The computation of Eq. (5) uses the fact that the sum of the first 
column of $Q_k$ is $2^k$ and the sum of any other column of $Q_k$ is 0. In addition, since 

\[
Q_k^T Q_k
= \begin{bmatrix} Q_{k-1} & Q_{k-1} \\ Q_{k-1} & -Q_{k-1} \end{bmatrix}^T
  \begin{bmatrix} Q_{k-1} & Q_{k-1} \\ Q_{k-1} & -Q_{k-1} \end{bmatrix}
= 2^k I,
\]

we know $Q_k$ is an eigenvector matrix for $W^T W$. 

From above, we know the eigenvalues and a set of corresponding eigenvectors of $W^T W$. According to 
the proof of Thm. 3.3, SVDB(W) can be achieved if and only if 

\[
\begin{bmatrix} \sqrt[4]{a + (2^k - 1)b} & & & \\ & \sqrt[4]{a - b} & & \\ & & \ddots & \\ & & & \sqrt[4]{a - b} \end{bmatrix} Q_k^T
\]

is a column-uniform matrix. Notice that $Q_k^s = J_{2^k}$, so according to Thm. D.1, the matrix above is column-uniform. 

□ 



The total errors of using the identity matrix or the workload itself as the strategy matrix are both $na$. 
Compared with those total errors, the ratio of total error reduced by using the strategy in Thm. C.1 is 
approximately $1 - \frac{b}{a}$. 

Since the workload AllPredicate($n$) is variable agnostic, a consequence of the above theorem is that 
we can find its optimal strategy. 

Corollary C.1. For $n = 2^k$, the minimized total error for the workload AllPredicate($n$) is equal to the 
singular value bound, which is $\frac{2^{n-2}}{n}\left(n - 1 + \sqrt{n + 1}\right)^2$. 
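
The construction in Theorem C.1 can be checked numerically for a small case. The sketch below uses AllPredicate(4), builds $Q_k$ by repeated Kronecker products, and forms a strategy by scaling the rows of $Q_k^T$ by the square roots of the singular values of W. The privacy constant is omitted throughout, and the scaling follows the condition $\Lambda_A = \sqrt{\Lambda_W}$ used in the proof of the singular value bound, which is an interpretation of the text and not code from the paper.

```python
import itertools
import numpy as np

n = 4
W = np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)  # AllPredicate(4)
G = W.T @ W
a, b = G[0, 0], G[0, 1]                       # diagonal and off-diagonal entries

# Q_k = [[Q_{k-1}, Q_{k-1}], [Q_{k-1}, -Q_{k-1}]], built with Kronecker products.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
Q = np.array([[1.0]])
for _ in range(2):                            # k = 2 since n = 2^2
    Q = np.kron(H2, Q)

eigs = np.array([a + (n - 1) * b] + [a - b] * (n - 1))
is_eigenbasis = np.allclose(G @ Q, Q @ np.diag(eigs))

# Strategy: rows of Q^T / sqrt(n) scaled by the fourth roots of the eigenvalues.
A = np.diag(eigs ** 0.25) @ Q.T / np.sqrt(n)
col_norms2 = np.sum(A * A, axis=0)            # column-uniform if all entries agree
error = np.max(col_norms2) * np.trace(G @ np.linalg.inv(A.T @ A))

svd_bound = np.linalg.svd(W, compute_uv=False).sum() ** 2 / n
closed_form = (np.sqrt(a + (n - 1) * b) + (n - 1) * np.sqrt(a - b)) ** 2 / n
print(is_eigenbasis, error, svd_bound, closed_form)
```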

D Proof of Main Theorems 

This section contains proofs of the most important theorems in Sections 3 and 4. 

D.1 Analysis of Minimal Strategies 



Here we complete the proof characterizing minimal strategies and continue the discussion in order to prove 
the singular value bound. 

Theorem 3.1. A strategy matrix A is minimal iff it is column-uniform. 

Proof. If a strategy $A_1$ is not column-uniform, a strategy $A_2$ that is more efficient than $A_1$ can be found 
according to the augmenting strategy theorem from [10]. Here we only show that any 
column-uniform matrix is minimal under the partial order. 

Consider two column-uniform strategies $A_1$ and $A_2$ such that $A_1 \le A_2$. Without loss of generality, 
we assume both of them have $L_2$ sensitivity 1, which means all diagonal entries of $A_1^T A_1$ and $A_2^T A_2$ 
are 1. Since $A_1 \le A_2$, for any query $w$, $\mathrm{Error}_{A_1}(w) \le \mathrm{Error}_{A_2}(w)$. Up to a constant factor that depends 
only on the privacy parameters, $\mathrm{Error}_{A_1}(w) = w^T (A_1^T A_1)^{-1} w$ and $\mathrm{Error}_{A_2}(w) = w^T (A_2^T A_2)^{-1} w$. Therefore, 

\[
\mathrm{Error}_{A_1}(w) - \mathrm{Error}_{A_2}(w)
= w^T (A_1^T A_1)^{-1} w - w^T (A_2^T A_2)^{-1} w
= w^T \left( (A_1^T A_1)^{-1} - (A_2^T A_2)^{-1} \right) w
\le 0. \quad (6)
\]

Since (6) is true for arbitrary $w$, $(A_2^T A_2)^{-1} - (A_1^T A_1)^{-1}$ is a positive semi-definite matrix, which for 
positive definite matrices is equivalent to $A_1^T A_1 - A_2^T A_2$ being positive semi-definite. In addition, by 
assumption, the diagonal entries of $A_1^T A_1$ and $A_2^T A_2$ are 1. Therefore the diagonal entries of $A_1^T A_1 - A_2^T A_2$ 
are all 0. The only positive semi-definite matrix whose diagonal entries are all 0 is the zero matrix, so 
$A_1^T A_1 = A_2^T A_2$. Thus we have $A_1 \equiv A_2$. □ 

As an application of Thm. 3.1, we have the following theorem. 

Theorem D.1. If an $m \times n$ strategy matrix A corresponds to an optimal solution of Problem 2.1, then $A \in \mathcal{U}_{m \times n}$, 
which is also equivalent to 

\[ (P_A^s)^T \Lambda_A^s = k\,\mathbf{1}. \quad (7) \]

Here $A = Q_A \Lambda_A P_A$ is the singular value decomposition of A, the vector $\Lambda_A$ consists of the singular 
values of A, $k$ is a real number, $\mathbf{1}$ is the all-ones vector, and $A^s$ represents the element-wise square of a matrix A. 

For any strategy matrix A, one can find an $n \times n$ matrix B such that $A^T A = B^T B$ by decomposing 
the matrix $A^T A$. It follows that $\mathrm{Error}_A(W) = \mathrm{Error}_B(W)$ and therefore it is sufficient to consider only 
$n \times n$ strategy matrices in Problem 2.1. Thus (1) can be represented using Frobenius norms, which are defined 
as follows. 



Definition D.1. (Frobenius Norm) Given an $m \times n$ matrix $A = \{a_{ij}\}$, the Frobenius norm of matrix A 
is defined as the square root of the sum of squares of its elements: 

\[ \|A\|_F = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}. \]

It is also equal to the square root of the trace of the matrix $A^T A$: 

\[ \|A\|_F = \sqrt{\mathrm{trace}(A^T A)}. \]

According to the definition of the Frobenius norm, and assuming the strategy matrix A is a square matrix, (2) can be 
rewritten as follows: 

\begin{align*}
\min_A \|A\|_2^2\,\mathrm{trace}((A^T A)^{-1} W^T W)
&= \min_A \|A\|_2^2\,\mathrm{trace}(W A^{-1} (A^{-1})^T W^T) \\
&= \min_A \|A\|_2^2\,\|W A^{-1}\|_F^2.
\end{align*}



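Lemma 1. If a workload W consists of two sets of independent queries, i.e. there exists an orthogonal 
matrix P such that 

\[ PW = \begin{bmatrix} W_1 & 0 \\ 0 & W_2 \end{bmatrix}, \]

the strategy matrix A that minimizes total error has form 

\[ A = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}, \]

where $A_1, A_2$ are the solutions to Problem 2.1 for $W_1, W_2$, respectively. 

Proof. Given a strategy matrix A, consider the Cholesky decomposition of $A^T A$, which is in form 

\[ A' = \begin{bmatrix} A_1 & A_3 \\ 0 & A_2 \end{bmatrix}. \]

Then 

\[
W A'^{-1}
= \begin{bmatrix} W_1 & 0 \\ 0 & W_2 \end{bmatrix}
  \begin{bmatrix} A_1^{-1} & -A_1^{-1} A_3 A_2^{-1} \\ 0 & A_2^{-1} \end{bmatrix}
= \begin{bmatrix} W_1 A_1^{-1} & -W_1 A_1^{-1} A_3 A_2^{-1} \\ 0 & W_2 A_2^{-1} \end{bmatrix}.
\]

Let strategy matrix $A_d$ be 

\[ A_d = \begin{bmatrix} A_1 & 0 \\ 0 & A_2 \end{bmatrix}. \]

Notice that 

\begin{align*}
\|W A_d^{-1}\|_F^2 &= \|W_1 A_1^{-1}\|_F^2 + \|W_2 A_2^{-1}\|_F^2 \\
&\le \|W_1 A_1^{-1}\|_F^2 + \|W_2 A_2^{-1}\|_F^2 + \|W_1 A_1^{-1} A_3 A_2^{-1}\|_F^2 = \|W A'^{-1}\|_F^2, \\
\|A_d\|_2^2 &= \max\{\|A_1\|_2^2, \|A_2\|_2^2\} \le \|A'\|_2^2,
\end{align*}

so we know $\mathrm{Error}_{A_d}(W) \le \mathrm{Error}_A(W)$. Thus the strategy matrix that minimizes (1) also has the same 
form as $A_d$. Furthermore, if $A_1$ is not an optimal strategy for $W_1$, let $A^*$ be an optimal strategy 
for $W_1$. Notice that $\frac{\|A_1\|_2}{\|A^*\|_2} A^*$ is also an optimal strategy for $W_1$, and substituting $\frac{\|A_1\|_2}{\|A^*\|_2} A^*$ for $A_1$ in $A_d$ will bring 
down the Frobenius norm without increasing $\|A_d\|_2$, so that the total error becomes smaller. Thus, if $A_d$ is 
an optimal strategy for W, $A_1, A_2$ must be the solutions to Problem 2.1 for $W_1, W_2$, respectively. □ 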

Theorem 3.3. (Singular Value Bound) Given anmx n workload W, let Ai, A 2 , . . . , A„ be the singular 
values ofW. 



mm 
A 



1 

Errora(W) > P(e,S)-(Y j Ai) 2 , 



where P{e,5) = 2lo ^' 5 \ 

Proof. For a given workload W, according to Thm. D.l, its optimal strategy matrix A satisfies AeW„ 
Therefore, 

||A|| 2 = -trace(A). 
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Let W = QwAwPw and A = QaAaPw be the singular decomposition of W and W, respectively, we 
have: 

ErroRa(W) 

min ||A|| 2 trace(W(A T A)- 1 W T ) 

Aeu nx „ 

= 1 min tracc(A)tracc((A T A)- 1 W' r W) 

n Ae«„ x „ 

= — min tracefA 2 ,) 
n (A A p A )ew„ x „ 

• trace(P A (A A 1 ) 2 P A PwAwPw) 

— min trace(Ai ) 
~ n a a ,p a 

• trace(Aw(PwPA) T (A A 1 ) 2 (P^ v P A )Aw) (8) 
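As in the proof of Lemma 1, the minimum of (8) is achieved when $P_AP_W^T$ is a diagonal matrix. Thus $P_A = P_W$
and

$$(8) = \frac{P(\epsilon,\delta)}{n} \min_{\Lambda_A} \mathrm{trace}(\Lambda_A^2)\, \mathrm{trace}\big(\Lambda_W^2(\Lambda_A^{-1})^2\big) \ge \frac{P(\epsilon,\delta)}{n}\Big(\sum_{i=1}^{n} \lambda_i\Big)^2. \qquad (9)$$

The inequality in (9) follows from the Cauchy-Schwarz inequality, and equality holds if and only if $\Lambda_A = \sqrt{\Lambda_W}$ up to a constant factor. Notice that the inequality in (8) comes from
removing the constraint that $(\Lambda_AP_A) \in \mathcal{U}_{n \times n}$. To satisfy equality in (8) and (9) simultaneously, we
need $\sqrt{\Lambda_W}P_W \in \mathcal{U}_{n \times n}$. □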
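
As a rough numerical illustration of Theorem 3.3, the sketch below computes the singular value bound and compares it with the error of an arbitrary invertible strategy. It assumes the error has the form $P(\epsilon,\delta)\,\|A\|_2^2\,\|WA^{-1}\|_F^2$ with $\|A\|_2$ the largest column norm, and it omits the common factor $P(\epsilon,\delta)$ on both sides; the function names `svd_bound` and `strategy_error` are hypothetical.

```python
import numpy as np

def svd_bound(W):
    """(1/n) * (sum of singular values of W)^2, without the factor P(eps, delta)."""
    n = W.shape[1]
    return np.linalg.svd(W, compute_uv=False).sum() ** 2 / n

def strategy_error(W, A):
    """||A||_2^2 * ||W A^+||_F^2, without the factor P(eps, delta).

    ||A||_2 is taken to be the largest L2 norm of a column of A.
    """
    max_col_norm_sq = (A ** 2).sum(axis=0).max()
    frob_sq = np.linalg.norm(W @ np.linalg.pinv(A), "fro") ** 2
    return max_col_norm_sq * frob_sq

rng = np.random.default_rng(0)
W = rng.standard_normal((7, 5))   # random 7 x 5 workload
A = rng.standard_normal((5, 5))   # arbitrary (almost surely invertible) strategy

print(strategy_error(W, A) >= svd_bound(W))

# For the identity workload, the identity strategy meets the bound exactly.
I4 = np.eye(4)
print(strategy_error(I4, I4), svd_bound(I4))
```

The identity case is one where $\sqrt{\Lambda_W}P_W$ has columns of equal norm, so the bound is tight there; for a general workload the bound need not be attained.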



D.2 The LSA Algorithm and Extensions. 

Lastly, we prove the complexity of the LSA algorithm and the basis of the workload separation technique. 

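Lemma 2. (Sherman-Morrison-Woodbury formula [6]) Given an invertible $n \times n$ matrix $X$, an $n \times m$ matrix $U$,
an invertible $m \times m$ matrix $\Lambda$ and an $m \times n$ matrix $V$ such that $X - U\Lambda V$ is invertible,

$$(X - U\Lambda V)^{-1} = X^{-1} + X^{-1}U(\Lambda^{-1} - VX^{-1}U)^{-1}VX^{-1}. \qquad (10)$$

Using this lemma, the test of each split point in step 6 of Program 4.1 can be done in $O(n^2)$ time,
resulting in the following overall running time: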

Theorem 4.1. The LSA algorithm can be implemented in $O(kn^4)$ time, where $k$ is the maximum number
of levels.

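Proof. Notice that in step 6 of Program 4.1, each time we compute the total error of a possible split of a row
$[v_1, v_2, \ldots, v_n]$ at position $i$, this is equivalent to modifying matrix $A'$ by removing the selected row and adding the
following two rows to matrix $A'$:

$$\begin{bmatrix} v'_1 & v'_2 & \cdots & v'_n \\ v''_1 & v''_2 & \cdots & v''_n \end{bmatrix}, \qquad v'_j, v''_j \in \{0, 1\}, \quad v'_j + v''_j = v_j, \quad 1 \le j \le n.$$
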



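To apply Lemma 2, let $X = A'^TA'$ and let $U$, $\Lambda$, $V$ be the following matrices:

$$U^T = V = \begin{bmatrix} v_1 & v_2 & \cdots & v_n \\ v'_1 & v'_2 & \cdots & v'_n \\ v''_1 & v''_2 & \cdots & v''_n \end{bmatrix}, \qquad \Lambda = \begin{bmatrix} 1 & 0 & 0 \\ 0 & -1 & 0 \\ 0 & 0 & -1 \end{bmatrix}.$$
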



Denote the modified matrix as $A''$. We can verify that $A''^TA'' = A'^TA' - U\Lambda V = X - U\Lambda V$.
Therefore the inverse of $A''^TA''$ can be computed using Eq. (10). Since $U$ and $V$ are $n \times 3$ and $3 \times n$
matrices respectively, given $X^{-1}$ the inverse of $A''^TA''$ can be computed in $O(n^2)$ time by first computing $X^{-1}U$,
$VX^{-1}U$ and $VX^{-1}$ and then finishing the evaluation from left to right. □
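
The update used in this proof can be sketched as follows. This is a minimal illustration rather than the paper's implementation: it assumes $X^{-1}$ is already available, that the removed row is $v$ and the two added rows are $v'$ and $v''$, and that $\Lambda = \mathrm{diag}(1, -1, -1)$ so that $X - U\Lambda V$ removes $v^Tv$ and adds $v'^Tv' + v''^Tv''$. The function name `split_update_inverse` and the small test matrix are hypothetical.

```python
import numpy as np

def split_update_inverse(X_inv, v, v1, v2):
    """Inverse of X - U*Lam*V given X_inv, using the Woodbury identity.

    U has columns v, v1, v2 (n x 3), V = U^T, Lam = diag(1, -1, -1), so
    X - U*Lam*V = X - v v^T + v1 v1^T + v2 v2^T. Every product below costs
    O(n^2) because one factor always has only 3 rows or 3 columns.
    """
    U = np.column_stack([v, v1, v2])       # n x 3
    V = U.T                                # 3 x n
    Lam = np.diag([1.0, -1.0, -1.0])
    XU = X_inv @ U                         # n x 3
    VX = V @ X_inv                         # 3 x n
    core = np.linalg.inv(np.linalg.inv(Lam) - V @ XU)  # 3 x 3
    return X_inv + (XU @ core) @ VX

# Strategy A': identity rows plus one all-ones row, which is then split in half.
A_old = np.vstack([np.eye(4), np.ones((1, 4))])
v = A_old[-1]
v1 = np.array([1.0, 1.0, 0.0, 0.0])
v2 = v - v1
X_inv = np.linalg.inv(A_old.T @ A_old)

A_new = np.vstack([np.eye(4), v1, v2])     # A'': row v replaced by v1 and v2
updated = split_update_inverse(X_inv, v, v1, v2)
print(np.allclose(updated, np.linalg.inv(A_new.T @ A_new)))
```

Grouping the products as `(XU @ core) @ VX` keeps every intermediate result at size $n \times 3$ or $3 \times n$ until the final $n \times n$ product, which matches the left-to-right evaluation described in the proof.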
Theorem 4.2. Let $W_1, \ldots, W_k$ be sets of predicate queries over one-way marginals on $k$ different dimensions.
Let $q_0$ be the sum of all the entries on the domain. With the answer of $q_0$ given, the estimate of any
query in $W_i$ does not involve queries in $W_1, \ldots, W_{i-1}, W_{i+1}, \ldots, W_k$.

Proof. Fix any $i$, $1 \le i \le k$, and a query $q \in W_i$. To estimate $q$, one needs to find a set of queries whose
linear combination is equal to $q$. Let the representation be

$$q = \sum_{j=1}^{l} \alpha_j q_j + \sum_{j=1}^{h} \beta_j p_j,$$

where $q_j \in W_i$, $1 \le j \le l$, and $p_j \in \bigcup_{m \ne i} W_m$, $1 \le j \le h$. The equation above is equivalent to

$$q - \sum_{j=1}^{l} \alpha_j q_j = \sum_{j=1}^{h} \beta_j p_j. \qquad (11)$$

Notice that the left-hand side of Eq. (11) is a sum of one-way marginal queries on the $i$-th dimension and the right-hand
side of Eq. (11) is a sum of one-way marginal queries that are not on the $i$-th dimension. Since the only
query that is shared by the one-way marginal queries on the $i$-th dimension and those not on the $i$-th dimension is the total sum $q_0$,

$$q - \sum_{j=1}^{l} \alpha_j q_j = \alpha_0 q_0,$$

where $\alpha_0$ is a constant. Since the answer of $q_0$ is given, estimating with $\alpha_0 q_0$ always has better accuracy
than estimating with the combination $\sum_{j=1}^{h} \beta_j p_j$ (which uses noisy answers). Therefore estimating $q$ only relates to
queries in $W_i$ and $q_0$ but not queries in $W_1, \ldots, W_{i-1}, W_{i+1}, \ldots, W_k$. □
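
The key step of this proof, that marginals on different dimensions share only the total sum, can be checked on a small example. The sketch below assumes a hypothetical $2 \times 3$ domain flattened in row-major order and takes $W_1$ and $W_2$ to be the one-way marginal cells themselves (any predicate queries over the marginals lie in their row spaces); the helper `in_row_space` is illustrative only.

```python
import numpy as np

d1, d2 = 2, 3                                   # sizes of the two dimensions
W1 = np.kron(np.eye(d1), np.ones((1, d2)))      # marginal on dimension 1 (2 x 6)
W2 = np.kron(np.ones((1, d1)), np.eye(d2))      # marginal on dimension 2 (3 x 6)
q0 = np.ones(d1 * d2)                           # total sum over the whole domain

rank = np.linalg.matrix_rank

# dim(rowspace(W1) ∩ rowspace(W2)) = rank(W1) + rank(W2) - rank([W1; W2])
shared_dim = rank(W1) + rank(W2) - rank(np.vstack([W1, W2]))

def in_row_space(M, q):
    """True if q is a linear combination of the rows of M."""
    return rank(np.vstack([M, q])) == rank(M)

print(shared_dim)                                    # dimension of the shared subspace
print(in_row_space(W1, q0), in_row_space(W2, q0))    # q0 lies in both row spaces
```

Since the shared subspace is one-dimensional and contains $q_0$, any linear combination of queries on one dimension that equals a combination on the other must be a multiple of $q_0$, as the proof argues.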